Replace remaining real-domain test hosts with .test#5189
Merged
Conversation
Follow-up to #5125 ("Replace bar.com with bar.test in tests"), which only swept bar.com. The same footgun lived in many other test fixtures using real-resolving .com domains in `Config.Host`, `databricks.yml`'s `workspace.host`, and `.databrickscfg`: foo.com, test.com, test2.com, myhost.com, myworkspace.com, x.com, a.com, www.host[12].com, other.host.com. Why this surfaced now --------------------- The SDK well-known endpoint resolver (added during an earlier SDK bump) performs an HTTP fetch with a retry loop the first time `Config.EnsureResolved` runs. For non-resolving hosts (`.test`, `.invalid`, etc.) the resolver fails fast. For real domains it depends on the network: a quick TCP RST or 404 also returns fast; a slow handshake or silent drop triggers a 5-minute retry window. Timeline of `task test (linux, direct)` job duration: | Date | SHA | Duration | Notable | |--------|-----------|----------|-------------------------------| | Apr 16 | 4909238 | 497s | SDK 0.127.0 bump | | Apr 21 | f2443de | 466s | SDK 0.128.0 bump | | Apr 28 | 987c2d9 | 494s | last fast run | | Apr 29 | 4ad6ba3 | 1077s | #5125 (bar.com -> bar.test) | | Apr 30 | 22be781 | 1205s | +10min vs baseline | | May 6 | 9fc1f8d | 1178s | | The SDK bumps did not move the needle. The +10min jump appeared on Apr 29, which lines up with the GitHub Actions runner network changing behavior toward these external domains - most likely a firewall or egress-filtering change that stopped fast-failing the lookups, turning every test that set a real-domain `Config.Host` into a 5-minute stall. #5125 fixed the bar.com family and recovered most of the regression. foo.com lived in a separate file with a slightly different naming convention and missed that sweep, leaving `TestFilesToSync_Everything*` bleeding ~10min combined - the residual ~10min still visible in the table above. What this PR does ----------------- Mechanical `.com -> .test` swap across all remaining test fixtures that set a host. Same low-risk pattern as #5125. Includes the `cmd/root/bundle_test.go` set (x.com / a.com - both real domains), the schema testdata pass files (myworkspace.com), and the cmd/auth testdata fixture plus its consumers. Also adds a rule to AGENTS.md / CLAUDE.md so future test fixtures use a non-resolving TLD by default and we don't have to relearn this through a CI slowdown. Co-authored-by: Isaac
Make the rule self-explaining: .test (and .example/.invalid/.localhost) are reserved by RFC 2606 §2 specifically so they don't resolve in real DNS, which is exactly the property we want for test fixture hosts. Co-authored-by: Isaac
simonfaltum
approved these changes
May 6, 2026
janniklasrose
added a commit
that referenced
this pull request
May 8, 2026
## Changes Rename bare-hostname test fixtures (no TLD) to use `.test`: - `libs/databrickscfg/profile/testdata/databrickscfg` — `https://default`, `https://query/?o=1234`, `https://foo` - `libs/databrickscfg/profile/testdata/sample-home/.databrickscfg` — `https://default` - `libs/databrickscfg/loader_test.go` — `Host:` fields and the `multiple profiles matched` error-message assertion - `libs/databrickscfg/ops_test.go` — only the two cases that match the renamed shared fixture profiles - `cmd/labs/project/testdata/installed-in-home/.databrickscfg` — `https://abc` `https://accounts.cloud.databricks.com` and `https://spog[-dup].databricks.com` in the same files are intentionally left as-is. ## Why Follow-up to #5189. That sweep replaced real `.com` hosts; this catches the bare-hostname variants the `.com` filter missed. The SDK well-known endpoint resolver treats single-label hosts as real DNS lookups, so these fixtures stall ~3s each on the Windows CI runner. From `test-output.json` of run 25496981743 (win-tf): | Test | linux | windows | |----------------------------------------------------------|--------|---------| | `TestLoaderSkipsExistingAuth` | 0.01s | 3.04s | | `TestLoaderSkipsExplicitAuthType` | 0.01s | 2.79s | | `TestLoaderMatchingHostWithQuery` | 0.03s | 2.78s | | `TestLoaderMatchingHost` | 0.00s | 2.76s | | `TestLoaderSkipsNonExistingConfigFile` | 0.01s | 2.75s | | `TestLoaderSkipsNoMatchingHost` | 0.01s | 1.27s | | `cmd/labs/project: TestRunningCommand` | 0.10s | 2.98s | | `cmd/labs/project: TestRenderingTable` | 0.03s | 2.92s | | `cmd/labs/project: TestRunningBlueprintEchoProfileWrong` | 0.01s | 2.76s | ~25s per Windows job × 2 engines = ~50s expected to come back. Scope is limited to fixtures that actually trigger the resolver. Other `https://foo`-style strings in `ops_test.go` are in tempdir-only tests that never hit DNS and remain untouched to keep the diff focused. ## Tests - `go test ./libs/databrickscfg/... ./cmd/labs/project/...` passes locally. - `./task fmt`, `./task checks`, `./task lint` all clean. - Watch the Windows-tf job duration on this PR — expect ~25–50s improvement. _This PR was written by Claude Code._
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to #5125 (Replace bar.com with bar.test in tests). The same footgun lived in many other test fixtures using real-resolving
.comdomains inConfig.Host,databricks.yml'sworkspace.host, and.databrickscfg:foo.com,test.com,test2.com,myhost.com,myworkspace.com,x.com,a.com,www.host[12].com,other.host.com.Why this surfaced now
The SDK started calling
resolveHostMetadataon everyConfiginit in databricks/databricks-sdk-go#1542 (released in SDK v0.123.0). The CLI pulled this in via #4799 (bump v0.120.0 → v0.126.0). The resolver performs an HTTP fetch against the host's well-known endpoint with a retry loop. For non-resolving hosts (.test,.invalid, etc.) it fails fast. For real domains it depends on the network: a quick TCP RST or 404 also returns fast; a slow handshake or silent drop triggers a 5-minute retry window.Timeline of
task test (linux, direct)job duration:The 0.127.0 / 0.128.0 bumps did not move the needle — the resolver was already live since the v0.126.0 bump in #4799. The +10min jump appeared on Apr 29, which lines up with the GitHub Actions runner network changing behavior toward these external domains — most likely a firewall or egress-filtering change that stopped fast-failing the lookups, turning every test that set a real-domain
Config.Hostinto a 5-minute stall.#5125 fixed the bar.com family and recovered most of the regression.
foo.comlived in a separate file with a slightly different naming convention and missed that sweep, leavingTestFilesToSync_Everything*bleeding ~10min combined — the residual still visible above.What this PR does
.com→.testswap across all remaining test fixtures that set a host. Same low-risk pattern as Replace bar.com with bar.test in tests #5125.cmd/root/bundle_test.goset (x.com/a.com— both real domains), the schema testdata pass files (myworkspace.com), and thecmd/authtestdata fixture plus its consumers.AGENTS.md/CLAUDE.mdciting RFC 2606 §2 so future test fixtures use a non-resolving TLD by default.Test plan
go build ./...cleanbundle/config/validate,bundle/run,libs/auth,libs/cmdctx,libs/template,cmd/auth,cmd/root,bundle/deploy/terraform,bundle/internal/schematask test (linux, direct)duration on this PR — expect recovery to the ~470s baseline